Unit I
Exploring Data: Describing patterns and departures from patterns.
Exploratory analysis of data makes use of graphical and numerical techniques to study patterns and departures from patterns. Emphasis should be placed on interpreting information from graphical and numerical displays and summaries.
A. Constructing and interpreting graphical displays of distributions of univariate data (dotplot, stemplot, histogram, cumulative frequency plot)
1. Center and spread
2. Clusters and gaps
3. Outliers and other unusual features
4. Shape
B. Summarizing distributions of univariate data
1. Measuring center: median, mean
2. Measuring spread: range, interquartile range, standard deviation
3. Measuring position: quartiles, percentiles, standardized scores (z-scores)
4. Using boxplots
5. The effect of changing units on summary measures
C. Comparing distributions of univariate data (dotplots, back-to-back stemplots, parallel boxplots)
1. Comparing center and spread: within group, between group variation
2. Comparing clusters and gaps
3. Comparing outliers and other unusual features
4. Comparing shapes
D. Exploring bivariate data
1. Analyzing patterns in scatterplots
2. Correlation and linearity
3. Least-squares regression line
4. Residual plots, outliers, and influential points
5. Transformations to achieve linearity: logarithmic and power transformations
E. Exploring categorical data
1. Frequency tables and bar charts
2. Marginal and joint frequencies for two-way tables
3. Conditional relative frequencies and association
4. Comparing distributions using bar charts
In a given set of data, identify the individuals and variables; identify each variable as categorical or quantitative.
In a given set of data, describe the overall pattern by giving numerical measures of center and spread and a description of shape.
Given a graphical distribution, is the distribution symmetric or skewed?
Given a set of data, which measures of center and spread are more appropriate: the mean and standard deviation or the five-number summary?
What is the area under a density curve?
Given a normal distribution with a stated mean and standard deviation, calculate the proportion of values above the given number, below the given number, and between two given numbers. Find the point with a stated proportion of all values above it.
Given a set of data, what is the correlation coefficient? Is there a positive or negative association, a linear pattern? Are there any outliers?
How do you find the slope and intercept of the least-squares regression line from the means and standard deviations of x and y and their correlation?
How does one calculate the residuals? How does the residual plot help one decide whether the relationship is truly linear?
Given a two-way table, find the marginal and conditional distributions.
Given a set of data, determine if an exponential or power function is a better fit.
Given an exponential or power function, perform a logarithmic transformation. What does this transformation indicate?
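One way to approach the regression and residual questions above is sketched here. It assumes the standard summary formulas, slope b = r * s_y / s_x and intercept a = y-bar - b * x-bar, and computes residuals as observed minus predicted. The data values and function names are hypothetical, chosen only for illustration.

```python
import statistics as st

def correlation(x, y):
    """Sample correlation r: sum of products of z-scores divided by n - 1."""
    n = len(x)
    mx, my = st.mean(x), st.mean(y)
    sx, sy = st.stdev(x), st.stdev(y)  # sample standard deviations
    return sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (n - 1)

def lsrl_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Slope and intercept of the least-squares line from summary statistics."""
    slope = r * sd_y / sd_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

def residuals(x, y, slope, intercept):
    """Residual = observed y minus predicted y."""
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
slope, intercept = lsrl_from_summary(st.mean(x), st.mean(y), st.stdev(x), st.stdev(y), r)
res = residuals(x, y, slope, intercept)
```

Plotting `res` against `x` gives the residual plot; a curved pattern there would suggest that a straight line is not an adequate model.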
Lab activity
Workshop Statistics
Correlation project
Graphing calculator quiz
Unit assessment
TEST
Unit II
Sampling and Experimentation: Planning and conducting a study.
Data must be collected according to a well-developed plan if valid information on a conjecture is to be obtained. This plan includes clarifying the question and deciding upon a method of data collection and analysis.
A. Overview of methods of data collection
1. Census
2. Sample survey
3. Experiment
4. Observational study
B. Planning and conducting surveys
1. Characteristics of a well-designed and well-conducted survey
2. Populations, samples, and random selection
3. Sources of bias in sampling and surveys
4. Sampling methods, including simple random sampling, stratified random sampling, and cluster sampling
C. Planning and conducting experiments
1. Characteristics of a well-designed and well-conducted experiment
2. Treatments, control groups, experimental units, random assignments, and replication
3. Sources of bias and confounding, including placebo effect and blinding
4. Completely randomized design
5. Randomized block design, including matched pairs design
D. Generalizability of results and types of conclusions that can be drawn from observational studies, experiments, and surveys
Given a set of data from a survey, what are the population and the sample? What are the explanatory and response variables?
Is a given study observational or experimental?
Given a study, outline a randomized design of an appropriate experiment.
How would you use random digits to simulate a given study?
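The randomization step of a completely randomized design can be sketched as follows. This is only an illustration: a software random number generator stands in for a table of random digits, and the subject labels, group names, and function name are hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly assign units to treatments in groups as equal in size as possible."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)          # random order plays the role of random digits
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical study: 20 labeled subjects, two groups.
subjects = [f"S{i:02d}" for i in range(1, 21)]
design = completely_randomized(subjects, ["treatment", "control"], seed=1)
```

Every subject appears in exactly one group, and chance alone decides which, which is what allows differences in the response to be attributed to the treatments.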
Lab activities: Jelly blubbers, jumping frogs, etc.
Workshop Statistics
Activity-Based Statistics
Graphing calculator quiz
Unit assessment
TEST
Unit III
Anticipating Patterns: Exploring random phenomena using probability and simulation.
Probability is the tool used for anticipating what the distribution of data should look like under a given model.
A. Probability
1. Interpreting probability, including long-run relative frequency interpretation
2. Law of Large Numbers concept
3. Addition rule, multiplication rule, conditional probability, and independence
4. Discrete random variables and their probability distributions, including binomial and geometric
5. Simulation of random behavior and probability distributions
6. Mean (expected value) and standard deviation of a random variable, and linear transformation of a random variable
B. Combining independent random variables
1. Notion of independence versus dependence
2. Mean and standard deviation for sums and differences of independent random variables
C. The normal distribution
1. Properties of the normal distribution
2. Using tables of the normal distribution
How are the probability rules used to determine the probabilities of defined events?
How are the addition rule (for the union of two events) and the multiplication rule (for the intersection of two events) used to solve problems?
Given two distributions, compare the two distributions using probability histograms, means, and standard deviations.
How do you calculate the mean and variance of a discrete random variable and the expected payout in a game of chance?
What are the conditions that identify a random variable as binomial?
What are the conditions that identify a random variable as geometric?
Given a sample or experiment, how do you describe the bias and variability of a statistic in terms of the mean and spread of the sampling distribution?
Given an SRS of size n from a population with a given mean and standard deviation, how are the mean and standard deviation of the sample mean calculated? What effect does increasing the sample size have on the standard deviation of the sample mean?
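The calculations asked about above can be sketched briefly. The mean of a discrete random variable is the sum of value times probability, the variance is the probability-weighted sum of squared deviations from that mean, and the standard deviation of the sample mean is sigma / sqrt(n), assuming an SRS from a population much larger than the sample. The game and its payouts are hypothetical.

```python
import math

def mean_and_variance(values, probs):
    """Mean and variance of a discrete random variable."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    return mu, var

def sd_of_sample_mean(sigma, n):
    """Standard deviation of the sample mean for an SRS of size n."""
    return sigma / math.sqrt(n)

# Hypothetical game: pay $1 to play, win $5 with probability 0.1.
# Net payouts are -1 (lose) and +4 (win).
payouts = [-1, 4]
probs = [0.9, 0.1]
mu, var = mean_and_variance(payouts, probs)   # expected payout and its variance

# Quadrupling the sample size halves the standard deviation of the sample mean.
sd_25 = sd_of_sample_mean(10, 25)
sd_100 = sd_of_sample_mean(10, 100)
```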
Lab activities
Workshop Statistics
Activity-Based Statistics
Graphing calculator quiz
Lab activity
Workshop Statistics
Graphing calculator quiz
Unit assessment
TEST
Unit IV
Statistical Inference: Estimating population parameters and testing hypotheses.
Statistical inference guides the selection of appropriate models.
A. Estimation (point estimators and confidence intervals)
1. Estimating population parameters and margins of error
2. Properties of point estimators, including unbiasedness and variability
3. Logic of confidence intervals, meaning of confidence level and confidence intervals, and properties of confidence intervals
4. Large sample confidence interval for a proportion
5. Large sample confidence interval for a difference between two proportions
6. Confidence interval for a mean
7. Confidence interval for a difference between two means (unpaired and paired)
8. Confidence interval for the slope of a least-squares regression line
B. Tests of significance
1. Logic of significance testing, null and alternative hypotheses; p-values; one- and two-sided tests; concepts of Type I and Type II errors; concept of power
2. Large sample test for a proportion
3. Large sample test for a difference between two proportions
4. Test for a mean
5. Test for a difference between two means (unpaired and paired)
6. Chi-square test for goodness of fit, homogeneity of proportions, and independence (one- and two-way tables)
7. Test for the slope of a least-squares regression line
What is meant by "95% confidence"?
Given a normal distribution with a known standard deviation, how is a confidence interval calculated?
What is the sample size required to obtain a confidence interval with a specified margin of error at a given confidence level?
Given a study, what are the null and alternative hypotheses? What test of significance would be appropriate? What assumptions must be met to use the given statistical procedure?
What effect does increasing the size of the sample have on the power when the significance level remains fixed?
What is a Type I error? What is a Type II error?
Given a problem, is inference about the mean or comparing two means necessary?
Given a study, is a one-sample, matched pairs, or two-sample procedure needed? Explain your choice.
Given a problem, is inference about a proportion or comparing two proportions necessary?
How is the confidence interval for a population proportion calculated?
How is a test of significance carried out for a hypothesis about a population proportion? For a hypothesis comparing proportions in two distinct populations?
Under what circumstances is a chi-square test appropriate to use?
What conditions have to be met to use the chi-square distribution?
What is the null hypothesis that the chi-square statistic tests in a two-way table?
If a test is significant, what are the most important deviations between the observed and expected counts?
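Two of the calculations asked about above can be sketched as follows: a large-sample confidence interval for a proportion, p-hat plus or minus z* * sqrt(p-hat(1 - p-hat)/n), and the sample size needed for a specified margin of error when estimating a mean with known sigma, n = (z* * sigma / m)^2 rounded up. The function names and the numbers are hypothetical, and the usual conditions (random sample, large enough counts of successes and failures) are assumed to hold.

```python
import math
from statistics import NormalDist

def z_star(confidence):
    """Critical value z* that leaves (1 - confidence)/2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    margin = z_star(confidence) * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n giving at most the stated margin of error, sigma known."""
    return math.ceil((z_star(confidence) * sigma / margin) ** 2)

# Hypothetical survey: 520 of 1000 respondents answer "yes".
low, high = proportion_ci(520, 1000)

# Hypothetical planning question: sigma = 15, desired margin of error 3.
n_needed = sample_size_for_mean(sigma=15, margin=3)
```

Rounding the sample size up rather than to the nearest integer keeps the margin of error at or below the target.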
Workshop Statistics
Activity-Based Statistics
Graphing calculator quiz
Project
Unit assessment